Overview

Dataset statistics

                                Original Dataset   Oversampled Dataset
Number of variables             9                  9
Number of observations          188                862
Missing cells                   0                  0
Missing cells (%)               0.0%               0.0%
Duplicate rows                  0                  172
Duplicate rows (%)              0.0%               20.0%
Total size in memory            13.3 KiB           67.3 KiB
Average record size in memory   72.7 B             80.0 B
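
The percentage and size figures in this table follow from the raw counts. A minimal sketch of that arithmetic for the Oversampled Dataset column, assuming 1 KiB = 1024 B and one-decimal rounding (the variable names are illustrative and not part of the report):

```python
# Counts copied from the Dataset statistics table (Oversampled Dataset column).
n_rows = 862
n_duplicates = 172
avg_record_bytes = 80.0

# Share of rows flagged as duplicates.
duplicate_pct = 100 * n_duplicates / n_rows

# Total size implied by the average record size (assumes 1 KiB = 1024 B).
total_kib = n_rows * avg_record_bytes / 1024

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Total size in memory: {total_kib:.1f} KiB")
```

Both values round to the figures reported in the table (20.0% and 67.3 KiB).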

Variable types

              Original Dataset   Oversampled Dataset
Categorical   4                  4
Numeric       5                  5

Alerts

Alert type         Original Dataset                                                                            Oversampled Dataset
High Correlation   time is highly overall correlated with distance                                             Alert not present
High Correlation   line_width is highly overall correlated with roughness                                      Alert not present
High Correlation   roughness is highly overall correlated with line_width                                      Alert not present
High Correlation   distance is highly overall correlated with time                                             Alert not present
High Correlation   ink_visco_cp is highly overall correlated with surface_tension_dyne_cm and 1 other field    Alert not present
High Correlation   surface_tension_dyne_cm is highly overall correlated with ink_visco_cp and 1 other field    Alert not present
High Correlation   ink _density is highly overall correlated with ink_visco_cp and 1 other field               Alert not present
Zeros              overspray has 8 (4.3%) zeros                                                                overspray has 18 (2.1%) zeros
Duplicates         Alert not present                                                                           Dataset has 172 (20.0%) duplicate rows
High Cardinality   Alert not present                                                                           distance has a high cardinality: 93 distinct values
High Cardinality   Alert not present                                                                           ink_visco_cp has a high cardinality: 213 distinct values
High Cardinality   Alert not present                                                                           surface_tension_dyne_cm has a high cardinality: 213 distinct values
High Cardinality   Alert not present                                                                           ink _density has a high cardinality: 51 distinct values
Imbalance          Alert not present                                                                           distance is highly imbalanced (55.8%)
Imbalance          Alert not present                                                                           ink_visco_cp is highly imbalanced (57.3%)
Imbalance          Alert not present                                                                           surface_tension_dyne_cm is highly imbalanced (57.3%)
Imbalance          Alert not present                                                                           ink _density is highly imbalanced (58.9%)

Reproduction

                         Original Dataset             Oversampled Dataset
Analysis started         2023-04-24 01:57:27.668052   2023-04-24 01:57:31.897946
Analysis finished        2023-04-24 01:57:31.882988   2023-04-24 01:57:34.604883
Duration                 4.21 seconds                 2.71 seconds
Software version         ydata-profiling v4.1.2       ydata-profiling v4.1.2
Download configuration   config.json                  config.json

Variables

distance
Categorical

               Original Dataset   Oversampled Dataset
Distinct       3                  93
Distinct (%)   1.6%               10.8%
Missing        0                  0
Missing (%)    0.0%               0.0%
Memory size    1.6 KiB            13.5 KiB

Length

                Original Dataset   Oversampled Dataset
Max length      3                  3
Median length   3                  3
Mean length     3                  3
Min length      3                  3

Characters and Unicode

                      Original Dataset   Oversampled Dataset
Total characters      564                2586
Distinct characters   5                  10
Distinct categories   1                  1
Distinct scripts      1                  1
Distinct blocks       1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Original Dataset   Oversampled Dataset
Unique       0                  45
Unique (%)   0.0%               5.2%

Sample

          Original Dataset   Oversampled Dataset
1st row   270                899
2nd row   270                903
3rd row   300                904
4th row   300                900
5th row   300                888

Common Values

Original Dataset

Value   Count   Frequency (%)
900     139     73.9%
300     47      25.0%
270     2       1.1%

Oversampled Dataset

Value               Count   Frequency (%)
900                 484     56.1%
300                 174     20.2%
911                 8       0.9%
887                 8       0.9%
270                 6       0.7%
902                 6       0.7%
904                 5       0.6%
923                 5       0.6%
890                 5       0.6%
912                 5       0.6%
Other values (83)   156     18.1%
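
The "Other values" row can be checked against the ten listed values. A short sketch, assuming the Oversampled Dataset has 862 rows and 93 distinct `distance` values as reported in the summary tables (the dictionary below simply restates the listed counts):

```python
# Ten most common `distance` values in the Oversampled Dataset.
top_counts = {
    "900": 484, "300": 174, "911": 8, "887": 8, "270": 6,
    "902": 6, "904": 5, "923": 5, "890": 5, "912": 5,
}
n_rows = 862      # Number of observations
n_distinct = 93   # Distinct values of `distance`

# Everything not listed individually is lumped into "Other values".
other_count = n_rows - sum(top_counts.values())
other_distinct = n_distinct - len(top_counts)
other_pct = 100 * other_count / n_rows

print(f"Other values ({other_distinct}): {other_count} ({other_pct:.1f}%)")
```

This reproduces the reported row: 83 remaining distinct values covering 156 rows, or 18.1%.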

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Original Dataset

[Figure: common values plot]

Oversampled Dataset

Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Most occurring characters

Original Dataset

Value   Count   Frequency (%)
0       374     66.3%
9       139     24.6%
3       47      8.3%
2       2       0.4%
7       2       0.4%

Oversampled Dataset

Value   Count   Frequency (%)
0       1380    53.4%
9       632     24.4%
3       220     8.5%
8       123     4.8%
2       71      2.7%
1       57      2.2%
7       44      1.7%
4       24      0.9%
6       18      0.7%
5       17      0.7%
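
The character counts for the Original Dataset can be derived from the value counts of `distance`. A minimal sketch, assuming each value is stored as the three-character string shown under Common Values:

```python
from collections import Counter

# Value counts for `distance` in the Original Dataset (from Common Values).
value_counts = {"900": 139, "300": 47, "270": 2}

# Tally every character of every value, weighted by how often the value occurs.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
for ch, n in char_counts.most_common():
    print(ch, n, f"{100 * n / total_chars:.1f}%")
```

The totals agree with the report: 564 characters overall, of which 374 are "0".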

Most occurring categories

Value            Original Dataset   Oversampled Dataset
Decimal Number   564 (100.0%)       2586 (100.0%)

Most occurring scripts

Value    Original Dataset   Oversampled Dataset
Common   564 (100.0%)       2586 (100.0%)

Most occurring blocks

Value   Original Dataset   Oversampled Dataset
ASCII   564 (100.0%)       2586 (100.0%)

Most frequent character per category, per script, and per block

Every character belongs to the single category Decimal Number, the single script Common, and the single block ASCII, so each of these breakdowns is identical to the "Most occurring characters" table.

time
Real number (ℝ)

               Original Dataset   Oversampled Dataset
Distinct       63                 434
Distinct (%)   33.5%              50.3%
Missing        0                  0
Missing (%)    0.0%               0.0%
Infinite       0                  0
Infinite (%)   0.0%               0.0%
Mean           71.138298          70.511514
Minimum        31                 28.356335
Maximum        130                130
Zeros          0                  0
Zeros (%)      0.0%               0.0%
Negative       0                  0
Negative (%)   0.0%               0.0%
Memory size    1.6 KiB            13.5 KiB

Quantile statistics

                            Original Dataset   Oversampled Dataset
Minimum                     31                 28.356335
5-th percentile             34                 33.627823
Q1                          45                 45
median                      69                 69.433682
Q3                          89.25              88.686351
95-th percentile            111.25             108
Maximum                     130                130
Range                       99                 101.64366
Interquartile range (IQR)   44.25              43.686351

Descriptive statistics

                                  Original Dataset   Oversampled Dataset
Standard deviation                24.68826           24.329103
Coefficient of variation (CV)     0.34704598         0.34503731
Kurtosis                          -0.82393317        -0.8470009
Mean                              71.138298          70.511514
Median Absolute Deviation (MAD)   21.5               19.554057
Skewness                          0.16926931         0.10926534
Sum                               13374              60780.925
Variance                          609.51018          591.90525
Monotonicity                      Not monotonic      Not monotonic
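
Several of these statistics are simple functions of the others. A small sketch of the relationships for `time` in the Original Dataset, using only the values reported here (the variable names are illustrative):

```python
# Values for `time` in the Original Dataset, copied from the report.
n = 188                                   # Number of observations
total = 13374                             # Sum
std = 24.68826                            # Standard deviation
minimum, q1, q3, maximum = 31, 45, 89.25, 130

mean = total / n                  # arithmetic mean
variance = std ** 2               # variance is the squared standard deviation
cv = std / mean                   # coefficient of variation
value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range

print(f"mean={mean:.6f} variance={variance:.5f} cv={cv:.8f}")
print(f"range={value_range} iqr={iqr}")
```

Up to rounding, the results match the reported Mean (71.138298), Variance (609.51018), CV (0.34704598), Range (99), and IQR (44.25).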
[Figure: histogram with fixed size bins (bins=50)]
Common values

Original Dataset

Value               Count   Frequency (%)
78                  9       4.8%
44                  9       4.8%
66                  8       4.3%
96                  8       4.3%
38                  8       4.3%
63                  7       3.7%
107                 6       3.2%
61                  6       3.2%
83                  5       2.7%
60                  5       2.7%
Other values (53)   117     62.2%

Oversampled Dataset

Value                Count   Frequency (%)
78                   24      2.8%
44                   21      2.4%
38                   21      2.4%
61                   20      2.3%
63                   19      2.2%
34                   17      2.0%
107                  17      2.0%
96                   16      1.9%
66                   14      1.6%
87                   14      1.6%
Other values (424)   679     78.8%

Smallest values

Original Dataset

Value   Count   Frequency (%)
31      2       1.1%
32      4       2.1%
34      5       2.7%
35      1       0.5%
36      2       1.1%
37      2       1.1%
38      8       4.3%
39      1       0.5%
40      4       2.1%
41      2       1.1%

Oversampled Dataset

Value         Count   Frequency (%)
28.35633542   1       0.1%
30.10913703   1       0.1%
31            5       0.6%
31.14085654   1       0.1%
31.1857756    1       0.1%
31.421662     1       0.1%
31.7004337    1       0.1%
31.76776959   1       0.1%
32            11      1.3%
32.00413804   1       0.1%

velocity
Real number (ℝ)

               Original Dataset   Oversampled Dataset
Distinct       73                 447
Distinct (%)   38.8%              51.9%
Missing        0                  0
Missing (%)    0.0%               0.0%
Infinite       0                  0
Infinite (%)   0.0%               0.0%
Mean           10.462428          10.608104
Minimum        6.667              6.667
Maximum        15.517241          15.517241
Zeros          0                  0
Zeros (%)      0.0%               0.0%
Negative       0                  0
Negative (%)   0.0%               0.0%
Memory size    1.6 KiB            13.5 KiB

Quantile statistics

                            Original Dataset   Oversampled Dataset
Minimum                     6.667              6.667
5-th percentile             6.818              6.923
Q1                          8.27675            8.411
median                      9.945              10.112
Q3                          12.9035            13.01586
95-th percentile            14.9139            14.896701
Maximum                     15.517241          15.517241
Range                       8.8502414          8.8502414
Interquartile range (IQR)   4.62675            4.6048605

Descriptive statistics

                                  Original Dataset   Oversampled Dataset
Standard deviation                2.6390637          2.5766508
Coefficient of variation (CV)     0.25224199         0.24289457
Kurtosis                          -1.181831          -1.2068823
Mean                              10.462428          10.608104
Median Absolute Deviation (MAD)   2.05               2.0923557
Skewness                          0.32023116         0.26939049
Sum                               1966.9365          9144.1854
Variance                          6.9646573          6.6391293
Monotonicity                      Not monotonic      Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Original Dataset

Value               Count   Frequency (%)
9.375               12      6.4%
6.818               9       4.8%
11.538              9       4.8%
13.636              6       3.2%
7.895               6       3.2%
14.754              6       3.2%
10.843              5       2.7%
15                  5       2.7%
8.411214953         5       2.7%
10.345              5       2.7%
Other values (63)   120     63.8%

Oversampled Dataset

Value                Count   Frequency (%)
9.375                27      3.1%
11.538               24      2.8%
6.818                21      2.4%
14.754               20      2.3%
7.895                16      1.9%
10.345               14      1.6%
8.411214953          14      1.6%
10.843               12      1.4%
14.28571429          12      1.4%
8.333333333          12      1.4%
Other values (437)   690     80.0%

Smallest values

Original Dataset

Value         Count   Frequency (%)
6.667         4       2.1%
6.818         9       4.8%
6.923         2       1.1%
6.976744186   1       0.5%
6.977         2       1.1%
7.142857143   1       0.5%
7.143         2       1.1%
7.317         1       0.5%
7.317073171   1       0.5%
7.5           4       2.1%

Oversampled Dataset

Value         Count   Frequency (%)
6.667         11      1.3%
6.683130991   1       0.1%
6.73400987    1       0.1%
6.752611279   1       0.1%
6.774707324   1       0.1%
6.818         21      2.4%
6.842750206   1       0.1%
6.85928052    1       0.1%
6.869708787   1       0.1%
6.923         6       0.7%

ink_visco_cp
Categorical

               Original Dataset   Oversampled Dataset
Distinct       2                  213
Distinct (%)   1.1%               24.7%
Missing        0                  0
Missing (%)    0.0%               0.0%
Memory size    1.6 KiB            13.5 KiB

Length

                Original Dataset   Oversampled Dataset
Max length      3                  18
Median length   3                  3
Mean length     3                  6.4095128
Min length      3                  3

Characters and Unicode

                      Original Dataset   Oversampled Dataset
Total characters      564                5525
Distinct characters   4                  11
Distinct categories   2                  2
Distinct scripts      1                  1
Distinct blocks       1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Original Dataset   Oversampled Dataset
Unique       0                  211
Unique (%)   0.0%               24.5%

Sample

          Original Dataset   Oversampled Dataset
1st row   6.3                6.909597767929222
2nd row   6.3                6.87655460358194
3rd row   6.3                6.890437650474096
4th row   6.3                6.9
5th row   6.3                6.903606919171564

Common Values

Original Dataset

Value   Count   Frequency (%)
6.9     140     74.5%
6.3     48      25.5%

Oversampled Dataset

Value                Count   Frequency (%)
6.9                  491     57.0%
6.3                  160     18.6%
6.909597767929222    1       0.1%
6.292210172530286    1       0.1%
6.919260921007547    1       0.1%
6.903671923663459    1       0.1%
6.868028799230724    1       0.1%
6.928740183047567    1       0.1%
6.885948147225283    1       0.1%
6.88772432365691     1       0.1%
Other values (203)   203     23.5%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Original Dataset

[Figure: common values plot]

Oversampled Dataset

Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Most occurring characters

Original Dataset

Value   Count   Frequency (%)
6       188     33.3%
.       188     33.3%
9       140     24.8%
3       48      8.5%

Oversampled Dataset

Value   Count   Frequency (%)
6       1165    21.1%
.       862     15.6%
9       852     15.4%
3       457     8.3%
8       360     6.5%
7       333     6.0%
2       333     6.0%
1       309     5.6%
5       302     5.5%
0       289     5.2%

Most occurring categories

Value               Original Dataset   Oversampled Dataset
Decimal Number      376 (66.7%)        4663 (84.4%)
Other Punctuation   188 (33.3%)        862 (15.6%)

Most frequent character per category

Decimal Number, Original Dataset

Value   Count   Frequency (%)
6       188     50.0%
9       140     37.2%
3       48      12.8%

Decimal Number, Oversampled Dataset

Value   Count   Frequency (%)
6       1165    25.0%
9       852     18.3%
3       457     9.8%
8       360     7.7%
7       333     7.1%
2       333     7.1%
1       309     6.6%
5       302     6.5%
0       289     6.2%
4       263     5.6%

Other Punctuation

Value   Original Dataset   Oversampled Dataset
.       188 (100.0%)       862 (100.0%)

Most occurring scripts

Value    Original Dataset   Oversampled Dataset
Common   564 (100.0%)       5525 (100.0%)

Most occurring blocks

Value   Original Dataset   Oversampled Dataset
ASCII   564 (100.0%)       5525 (100.0%)

Most frequent character per script and per block

Every character belongs to the single script Common and the single block ASCII, so both breakdowns are identical to the "Most occurring characters" table.

surface_tension_dyne_cm
Categorical

               Original Dataset   Oversampled Dataset
Distinct       2                  213
Distinct (%)   1.1%               24.7%
Missing        0                  0
Missing (%)    0.0%               0.0%
Memory size    1.6 KiB            13.5 KiB

Length

                Original Dataset   Oversampled Dataset
Max length      4                  18
Median length   4                  4
Mean length     4                  7.2703016
Min length      4                  4

Characters and Unicode

                      Original Dataset   Oversampled Dataset
Total characters      752                6267
Distinct characters   5                  11
Distinct categories   2                  2
Distinct scripts      1                  1
Distinct blocks       1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Original Dataset   Oversampled Dataset
Unique       0                  211
Unique (%)   0.0%               24.5%

Sample

          Original Dataset   Oversampled Dataset
1st row   30.9               32.35628987700318
2nd row   30.9               32.281981516487974
3rd row   30.9               32.278331835047496
4th row   30.9               32.3
5th row   30.9               32.284442353737056

Common Values

Original Dataset

Value   Count   Frequency (%)
32.3    140     74.5%
30.9    48      25.5%

Oversampled Dataset

Value                Count   Frequency (%)
32.3                 491     57.0%
30.9                 160     18.6%
32.35628987700318    1       0.1%
30.938845924874386   1       0.1%
32.26575526389176    1       0.1%
32.36777590272072    1       0.1%
32.27836696945391    1       0.1%
32.308064267306214   1       0.1%
32.304756248394845   1       0.1%
32.28503124123783    1       0.1%
Other values (203)   203     23.5%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Original Dataset

[Figure: common values plot]

Oversampled Dataset

Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Most occurring characters

Original Dataset

Value   Count   Frequency (%)
3       328     43.6%
.       188     25.0%
2       140     18.6%
0       48      6.4%
9       48      6.4%

Oversampled Dataset

Value   Count   Frequency (%)
3       1686    26.9%
2       953     15.2%
.       862     13.8%
0       476     7.6%
9       465     7.4%
6       330     5.3%
8       310     4.9%
1       307     4.9%
5       299     4.8%
7       292     4.7%

Most occurring categories

Value               Original Dataset   Oversampled Dataset
Decimal Number      564 (75.0%)        5405 (86.2%)
Other Punctuation   188 (25.0%)        862 (13.8%)

Most frequent character per category

Decimal Number, Original Dataset

Value   Count   Frequency (%)
3       328     58.2%
2       140     24.8%
0       48      8.5%
9       48      8.5%

Decimal Number, Oversampled Dataset

Value   Count   Frequency (%)
3       1686    31.2%
2       953     17.6%
0       476     8.8%
9       465     8.6%
6       330     6.1%
8       310     5.7%
1       307     5.7%
5       299     5.5%
7       292     5.4%
4       287     5.3%

Other Punctuation

Value   Original Dataset   Oversampled Dataset
.       188 (100.0%)       862 (100.0%)

Most occurring scripts

Value    Original Dataset   Oversampled Dataset
Common   752 (100.0%)       6267 (100.0%)

Most occurring blocks

Value   Original Dataset   Oversampled Dataset
ASCII   752 (100.0%)       6267 (100.0%)

Most frequent character per script and per block

Every character belongs to the single script Common and the single block ASCII, so both breakdowns are identical to the "Most occurring characters" table.

ink _density
Categorical

               Original Dataset   Oversampled Dataset
Distinct       2                  51
Distinct (%)   1.1%               5.9%
Missing        0                  0
Missing (%)    0.0%               0.0%
Memory size    1.6 KiB            13.5 KiB

Length

                Original Dataset   Oversampled Dataset
Max length      4                  4
Median length   4                  4
Mean length     4                  4
Min length      4                  4

Characters and Unicode

                      Original Dataset   Oversampled Dataset
Total characters      752                3448
Distinct characters   5                  10
Distinct categories   1                  1
Distinct scripts      1                  1
Distinct blocks       1                  1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Original Dataset   Oversampled Dataset
Unique       0                  25
Unique (%)   0.0%               2.9%

Sample

          Original Dataset   Oversampled Dataset
1st row   1517               1614
2nd row   1517               1612
3rd row   1517               1610
4th row   1517               1614
5th row   1517               1612

Common Values

Original Dataset

Value   Count   Frequency (%)
1614    140     74.5%
1517    48      25.5%

Oversampled Dataset

Value               Count   Frequency (%)
1614                519     60.2%
1517                173     20.1%
1613                21      2.4%
1615                15      1.7%
1612                13      1.5%
1616                13      1.5%
1611                9       1.0%
1609                7       0.8%
1515                6       0.7%
1610                6       0.7%
Other values (41)   80      9.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Original Dataset

[Figure: common values plot]

Oversampled Dataset

Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Value               Count   Frequency (%)
1614                519     60.2%
1517                173     20.1%
1613                21      2.4%
1615                15      1.7%
1612                13      1.5%
1616                13      1.5%
1611                9       1.0%
1609                7       0.8%
1518                6       0.7%
1617                6       0.7%
Other values (41)   80      9.3%

Most occurring characters

Original Dataset

Value   Count   Frequency (%)
1       376     50.0%
6       140     18.6%
4       140     18.6%
5       48      6.4%
7       48      6.4%

Oversampled Dataset

Value   Count   Frequency (%)
1       1690    49.0%
6       651     18.9%
4       525     15.2%
5       266     7.7%
7       181     5.2%
2       34      1.0%
3       32      0.9%
0       29      0.8%
9       20      0.6%
8       20      0.6%

Most occurring categories

Value            Original Dataset   Oversampled Dataset
Decimal Number   752 (100.0%)       3448 (100.0%)

Most occurring scripts

Value    Original Dataset   Oversampled Dataset
Common   752 (100.0%)       3448 (100.0%)

Most occurring blocks

Value   Original Dataset   Oversampled Dataset
ASCII   752 (100.0%)       3448 (100.0%)

Most frequent character per category, per script, and per block

Every character belongs to the single category Decimal Number, the single script Common, and the single block ASCII, so each of these breakdowns is identical to the "Most occurring characters" table.

line_width
Real number (ℝ)

               Original Dataset   Oversampled Dataset
Distinct       100                162
Distinct (%)   53.2%              18.8%
Missing        0                  0
Missing (%)    0.0%               0.0%
Infinite       0                  0
Infinite (%)   0.0%               0.0%
Mean           229.3617           252.35847
Minimum        112                112
Maximum        391                457
Zeros          0                  0
Zeros (%)      0.0%               0.0%
Negative       0                  0
Negative (%)   0.0%               0.0%
Memory size    1.6 KiB            13.5 KiB

Quantile statistics

                            Original Dataset   Oversampled Dataset
Minimum                     112                112
5-th percentile             179                183
Q1                          194                208
median                      222.5              253
Q3                          260                294
95-th percentile            305.65             322.95
Maximum                     391                457
Range                       279                345
Interquartile range (IQR)   66                 86

Descriptive statistics

                                  Original Dataset   Oversampled Dataset
Standard deviation                43.833631          51.6546
Coefficient of variation (CV)     0.19111138         0.20468741
Kurtosis                          0.33187771         0.25096912
Mean                              229.3617           252.35847
Median Absolute Deviation (MAD)   31.5               43
Skewness                          0.57550192         0.38003583
Sum                               43120              217533
Variance                          1921.3872          2668.1977
Monotonicity                      Not monotonic      Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Original Dataset

Value               Count   Frequency (%)
191                 5       2.7%
194                 4       2.1%
183                 4       2.1%
193                 4       2.1%
224                 4       2.1%
203                 4       2.1%
232                 4       2.1%
204                 4       2.1%
207                 4       2.1%
185                 4       2.1%
Other values (90)   147     78.2%

Oversampled Dataset

Value                Count   Frequency (%)
303                  17      2.0%
321                  14      1.6%
305                  14      1.6%
306                  13      1.5%
218                  13      1.5%
194                  12      1.4%
225                  12      1.4%
191                  12      1.4%
224                  12      1.4%
203                  12      1.4%
Other values (152)   731     84.8%

Smallest values

Original Dataset

Value   Count   Frequency (%)
112     1       0.5%
123     1       0.5%
142     1       0.5%
163     1       0.5%
167     1       0.5%
176     1       0.5%
177     1       0.5%
178     1       0.5%
179     3       1.6%
180     2       1.1%

Oversampled Dataset

Value   Count   Frequency (%)
112     3       0.3%
123     3       0.3%
142     1       0.1%
163     3       0.3%
167     2       0.2%
176     3       0.3%
177     2       0.2%
178     3       0.3%
179     8       0.9%
180     5       0.6%

overspray
Real number (ℝ)

               Original Dataset   Oversampled Dataset
Distinct       119                265
Distinct (%)   63.3%              30.7%
Missing        0                  0
Missing (%)    0.0%               0.0%
Infinite       0                  0
Infinite (%)   0.0%               0.0%
Mean           104.83511          139.60441
Minimum        0                  0
Maximum        415                423
Zeros          8                  18
Zeros (%)      4.3%               2.1%
Negative       0                  0
Negative (%)   0.0%               0.0%
Memory size    1.6 KiB            13.5 KiB

Quantile statistics

                            Original Dataset   Oversampled Dataset
Minimum                     0                  0
5-th percentile             1                  3
Q1                          16                 33
median                      59                 99
Q3                          169                241
95-th percentile            341.6              372
Maximum                     415                423
Range                       415                423
Interquartile range (IQR)   153                208

Descriptive statistics

                                  Original Dataset   Oversampled Dataset
Standard deviation                110.08344          122.40498
Coefficient of variation (CV)     1.0500627          0.87679878
Kurtosis                          0.28180928         -0.86703726
Mean                              104.83511          139.60441
Median Absolute Deviation (MAD)   49                 87
Skewness                          1.1385965          0.63833792
Sum                               19709              120339
Variance                          12118.363          14982.978
Monotonicity                      Not monotonic      Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Original Dataset

Value                Count   Frequency (%)
0                    8       4.3%
10                   5       2.7%
91                   5       2.7%
3                    5       2.7%
7                    4       2.1%
47                   4       2.1%
32                   4       2.1%
220                  4       2.1%
24                   3       1.6%
5                    3       1.6%
Other values (109)   143     76.1%

Oversampled Dataset

Value                Count   Frequency (%)
0                    18      2.1%
91                   15      1.7%
7                    14      1.6%
10                   12      1.4%
201                  11      1.3%
11                   10      1.2%
3                    10      1.2%
34                   10      1.2%
32                   10      1.2%
220                  10      1.2%
Other values (255)   742     86.1%

Smallest values

Original Dataset

Value   Count   Frequency (%)
0       8       4.3%
1       3       1.6%
2       3       1.6%
3       5       2.7%
4       1       0.5%
5       3       1.6%
6       1       0.5%
7       4       2.1%
8       2       1.1%
9       1       0.5%

Oversampled Dataset

Value   Count   Frequency (%)
0       18      2.1%
1       9       1.0%
2       9       1.0%
3       10      1.2%
4       4       0.5%
5       9       1.0%
6       4       0.5%
7       14      1.6%
8       6       0.7%
9       4       0.5%

roughness
Real number (ℝ)

               Original Dataset   Oversampled Dataset
Distinct       94                 131
Distinct (%)   50.0%              15.2%
Missing        0                  0
Missing (%)    0.0%               0.0%
Infinite       0                  0
Infinite (%)   0.0%               0.0%
Mean           98.037234          112.28422
Minimum        43                 43
Maximum        192                228
Zeros          0                  0
Zeros (%)      0.0%               0.0%
Negative       0                  0
Negative (%)   0.0%               0.0%
Memory size    1.6 KiB            13.5 KiB

Quantile statistics

                            Original Dataset   Oversampled Dataset
Minimum                     43                 43
5-th percentile             58.35              63.05
Q1                          75                 82
median                      91                 112
Q3                          117.25             142
95-th percentile            152.65             164
Maximum                     192                228
Range                       149                185
Interquartile range (IQR)   42.25              60

Descriptive statistics

                                  Original Dataset   Oversampled Dataset
Standard deviation                30.766043          34.582315
Coefficient of variation (CV)     0.31381998         0.30798909
Kurtosis                          0.11304357         -0.8400436
Mean                              98.037234          112.28422
Median Absolute Deviation (MAD)   19                 30
Skewness                          0.73342434         0.12559766
Sum                               18431              96789
Variance                          946.54941          1195.9365
Monotonicity                      Not monotonic      Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Common values

Original Dataset

Value               Count   Frequency (%)
77                  8       4.3%
68                  8       4.3%
73                  6       3.2%
99                  6       3.2%
84                  6       3.2%
75                  5       2.7%
85                  5       2.7%
108                 5       2.7%
72                  5       2.7%
117                 4       2.1%
Other values (84)   130     69.1%

Oversampled Dataset

Value                Count   Frequency (%)
77                   25      2.9%
68                   21      2.4%
145                  18      2.1%
118                  17      2.0%
99                   17      2.0%
108                  16      1.9%
84                   16      1.9%
71                   15      1.7%
73                   15      1.7%
85                   15      1.7%
Other values (121)   687     79.7%

Smallest values

Original Dataset

Value   Count   Frequency (%)
43      1       0.5%
44      1       0.5%
45      1       0.5%
48      2       1.1%
49      2       1.1%
54      1       0.5%
57      1       0.5%
58      1       0.5%
59      1       0.5%
60      1       0.5%

Oversampled Dataset

Value   Count   Frequency (%)
43      3       0.3%
44      3       0.3%
45      4       0.5%
46      2       0.2%
48      3       0.3%
49      5       0.6%
50      1       0.1%
51      2       0.2%
54      4       0.5%
56      1       0.1%

Interactions

[Figures: 25 pairs of interaction plots, each pair showing one plot for the Original Dataset and one for the Oversampled Dataset]

Correlations

[Figure: correlation matrix plot]

                          time     velocity   line_width   overspray   roughness   distance   ink_visco_cp   surface_tension_dyne_cm   ink _density
time                      1.000    0.023      -0.042       -0.067      -0.122      0.687      0.275          0.275                     0.275
velocity                  0.023    1.000      0.300        0.062       0.136       0.482      0.278          0.278                     0.278
line_width                -0.042   0.300      1.000        0.290       0.619       0.000      0.000          0.000                     0.000
overspray                 -0.067   0.062      0.290        1.000       0.229       0.000      0.198          0.198                     0.198
roughness                 -0.122   0.136      0.619        0.229       1.000       0.202      0.271          0.271                     0.271
distance                  0.687    0.482      0.000        0.000       0.202       1.000      0.151          0.151                     0.151
ink_visco_cp              0.275    0.278      0.000        0.198       0.271       0.151      1.000          0.986                     0.986
surface_tension_dyne_cm   0.275    0.278      0.000        0.198       0.271       0.151      0.986          1.000                     0.986
ink _density              0.275    0.278      0.000        0.198       0.271       0.151      0.986          0.986                     1.000
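
The matrix can be scanned for strongly correlated pairs. The sketch below is illustrative only: the report does not state the cut-off used for its High Correlation alerts, so `THRESHOLD = 0.5` is an assumption, and the coefficients are copied from the matrix above.

```python
from itertools import combinations

names = [
    "time", "velocity", "line_width", "overspray", "roughness",
    "distance", "ink_visco_cp", "surface_tension_dyne_cm", "ink _density",
]
matrix = [
    [1.000, 0.023, -0.042, -0.067, -0.122, 0.687, 0.275, 0.275, 0.275],
    [0.023, 1.000, 0.300, 0.062, 0.136, 0.482, 0.278, 0.278, 0.278],
    [-0.042, 0.300, 1.000, 0.290, 0.619, 0.000, 0.000, 0.000, 0.000],
    [-0.067, 0.062, 0.290, 1.000, 0.229, 0.000, 0.198, 0.198, 0.198],
    [-0.122, 0.136, 0.619, 0.229, 1.000, 0.202, 0.271, 0.271, 0.271],
    [0.687, 0.482, 0.000, 0.000, 0.202, 1.000, 0.151, 0.151, 0.151],
    [0.275, 0.278, 0.000, 0.198, 0.271, 0.151, 1.000, 0.986, 0.986],
    [0.275, 0.278, 0.000, 0.198, 0.271, 0.151, 0.986, 1.000, 0.986],
    [0.275, 0.278, 0.000, 0.198, 0.271, 0.151, 0.986, 0.986, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off; not stated in the report

# Visit each unordered pair once (upper triangle of the symmetric matrix).
high_pairs = [
    (a, b, matrix[i][j])
    for (i, a), (j, b) in combinations(enumerate(names), 2)
    if abs(matrix[i][j]) > THRESHOLD
]

for a, b, r in high_pairs:
    print(f"{a} ~ {b}: {r:.3f}")
```

Under this assumed threshold, the selected pairs coincide with the High Correlation alerts listed for the Original Dataset: time with distance, line_width with roughness, and the three ink properties with one another.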

Missing values

Original Dataset

[Figure] A simple visualization of nullity by column.

Oversampled Dataset

[Figure] A simple visualization of nullity by column.

Original Dataset

[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Oversampled Dataset

[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Original Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness
027034.07.9416.330.9151729412164
127034.07.9416.330.91517261136141
230038.07.8956.330.9151721811103
330044.06.8186.330.915171901568
430041.07.3176.330.915171909190
530040.07.5006.932.31614180062
630038.07.8956.932.316141788082
730043.06.9776.330.9151718524145
830043.06.9776.330.9151721350161
930034.08.8246.330.915173238171

Oversampled Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness
089962.21449914.0420996.90959832.3562901614285103118
190361.87344614.5712946.87655532.2819821612284105119
290464.22397314.1728916.89043832.2783321610286111116
390070.41167013.1471056.90000032.300000161428794118
488861.52692014.3588776.90360732.2844421612285114117
590086.48193610.7170316.90000032.300000161428710693
692393.4969429.6559956.88252932.243484161329110588
791494.9969849.7507366.88609132.261477161928810784
890493.0247319.5182236.90806432.288811160828710987
990092.7319149.7671866.90000032.300000161428710686

Original Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness
178900108.08.3333336.932.3161421228272
17990093.09.6774196.932.3161432347157
18090093.09.6774196.932.31614305201108
18190094.09.5744686.932.3161428810785
18290095.09.4736846.932.3161429024115
18390096.09.3750006.932.316142621794
18490096.09.3750006.932.316142411586
18590096.09.3750006.932.316141917787
186900108.08.3333336.932.31614188173
187900107.08.4112156.932.31614203545

Oversampled Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness
175900107.08.4112156.932.316141947276
176900107.08.4112156.932.31614204573
177900108.08.3333336.932.316141918199
178900108.08.3333336.932.3161421228272
17990093.09.6774196.932.3161432347157
18090093.09.6774196.932.31614305201108
18190094.09.5744686.932.3161428810785
18590096.09.3750006.932.316141917787
186900108.08.3333336.932.31614188173
187900107.08.4112156.932.31614203545

Duplicate rows

Original Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness# duplicates
Dataset does not contain duplicate rows.

Oversampled Dataset

distancetimevelocityink_visco_cpsurface_tension_dyne_cmink _densityline_widthoversprayroughness# duplicates
027034.07.9416.330.915172611361413
127034.07.9416.330.91517294121643
230031.09.6776.330.915172181871173
430032.09.3756.330.915173061021453
530032.09.3756.932.31614183107843
730032.09.3756.932.316142542681313
830034.08.8246.330.9151722627953
930034.08.8246.330.9151732381713
1030034.08.8246.932.316142192131853
1130036.08.3336.932.3161419341713